We describe a new fault-tolerant algorithm for solving a variant of Lamport's clock synchronization
problem.  The  algorithm  is  designed  for  a  system  of  distributed  processes  that  communicate  by 
sending  messages.  Each  process  has  its  own  read-only  physical  clock  whose  drift  rate  from  real  time 
is  very  small.  By  adding  a  value  to  its  physical  clock  time,  the  process  obtains  its  local  time.  The 
algorithm  solves  the  problem  of  maintaining  closely  synchronized  local  times,  assuming  that 
processes’  local  times  are  closely  synchronized  initially.  The  algorithm  is  able  to  tolerate  the  failure  of 
just  under  a  third  of  the  participating  processes.  It  maintains  synchronization  to  within  a  small 
constant,  whose  magnitude  depends  upon  the  rate  of  clock  drift,  the  message  delivery  time,  and  the 
initial  closeness  of  synchronization.  We  also  give  a  characterization  of  how  far  the  clocks  drift  from 
real  time.  Reintegration  of  a  repaired  process  can  be  accomplished  using  a  slight  modification  of  the 
basic  algorithm.  A  similar  style  algorithm  can  also  be  used  to  achieve  synchronization  initially. 

1. Introduction

Keeping the local times of processes in a distributed system synchronized in the presence of
arbitrary  faults  is  important  in  many  applications  and  is  an  interesting  problem  in  its  own  right.  Taking 
into  account  the  clocks’  drift  from  real  time  and  varying  message  delivery  times  makes  the  problem 
more  realistic  and  more  challenging.  In  order  to  be  truly  useful,  a  solution  to  this  problem  must  allow 
faulty  processes  that  have  recovered  to  be  reintegrated  into  the  system.  The  algorithm  described  in 
this  paper  meets  these  requirements,  assuming  that  the  clocks  are  initially  close  together  and  that 
fewer  than  one  third  of  the  processes  are  faulty. 

In  our  model,  processes  are  assumed  to  have  access  to  local  read-only  physical  clocks,  which  are 
subject  to  a  very  small  rate  of  drift.  A  process'  local  time  is  obtained  by  adding  the  value  of  the 
physical  clock  to  the  value  of  a  local  "correction"  variable.  We  assume  that  processes  are  totally 
connected  for  communication.  They  communicate  by  messages,  over  a  reliable  transmission 
medium.  There  are  upper  and  lower  bounds  on  the  length  of  time  that  any  message  takes  to  arrive  at 
its  destination.  We  do  not  require  the  existence  of  unforgeable  signatures. 

Our  algorithm  runs  in  rounds,  resynchronizing  every  so  often  to  correct  for  the  clocks  drifting  out  of 
synchrony,  and  using  a  fault-tolerant  averaging  function  based  on  those  in  [DLPSW]  to  calculate  an 
adjustment.  The  size  of  the  adjustment  made  to  a  clock  at  each  round  is  independent  of  the  number 
of faulty processes. At each round, $n^2$ messages are required, where n is the total number of
processes.  The  closeness  of  synchronization  achieved  depends  only  on  the  initial  closeness  of 
synchronization,  the  message  delivery  time  and  its  uncertainty,  and  the  drift  rate.  Since  the  closeness 
of  synchronization  depends  on  the  initial  closeness,  this  is,  in  the  terminology  of  [LM],  an  interactive 
convergence  algorithm.  We  give  explicit  bounds  on  how  the  difference  between  the  clock  values  and 
real  time  grows.  The  algorithm  can  be  easily  adapted  to  become  a  reintegration  procedure  for 
repaired  processes. 

Lamport  and  Melliar-Smith  [LM],  Halpern,  Simons  and  Strong  [HSS],  and  Marzullo  [M]  also  have 
clock synchronization algorithms that run in rounds. The three algorithms in [LM], like ours, require
a  reliable,  completely  connected  communication  network  and  handle  arbitrary  faults.  However,  the 
closeness  of  synchronization  achieved  by  one  depends  on  the  number  of  processes  and  that 
achieved  by  the  other  two  depends  on  the  number  of  faulty  processes.  In  two  of  them,  the  size  of  the 
adjustment  also  depends  on  the  number  of  faulty  processes  and  the  number  of  messages  is 
exponential.  Although  one  algorithm  only  needs  a  majority  of  the  processes  to  be  nonfaulty,  it 
assumes  unforgeable  digital  signatures.  The  algorithm  of  [HSS]  is  resilient  to  any  number  of  faults  (as 
long as the network remains connected), has $n^2$ message complexity per round, and achieves a
closeness of synchronization very similar to ours. But the size of the adjustment depends on the
number  of  processes  and  unforgeable  digital  signatures  are  necessary.  The  framework  and  error 
model  used  in  [M]  make  a  direct  comparison  of  results  with  ours  difficult.  Only  [HSS]  includes  a 
reintegration  procedure. 

The  problem  addressed  in  the  earlier  papers  is  only  that  of  maintaining  synchronization  of  local 
times  once  it  has  been  established.  There  is,  of  course,  the  separate  problem  of  establishing  such 
synchronization  in  the  first  place.  A  variant  of  the  algorithm  in  this  paper  can  be  used  to  establish  the 
initial  synchronization,  as  well  as  to  maintain  the  synchronization.  This  variant,  together  with  a 
description  of  the  interface  between  the  two  algorithms,  will  be  briefly  sketched. 

The remainder of this paper is organized as follows. In Section 2 we describe the underlying model
upon which our work is based in more detail, but still informally. In Section 3 the assumptions we
make about clock behavior are given and the problem to be solved is stated precisely, in terms of the
model described in Section 2. The algorithm to solve the problem is presented in Section 4. This
simple  algorithm  is  described  in  words  first,  and  then  in  a  high  level  "programming  language".  We 
explain  how  the  high  level  language  can  be  "compiled"  into  our  model.  Section  5  contains  an 
inductive  proof  that  some  important  properties  hold  at  every  round.  We  give  an  upper  bound  on  the 
amount  by  which  any  nonfaulty  process’  clock  is  changed  at  any  time.  Section  6  includes 
background  needed  for  the  results  of  Section  7,  which  contains  the  answers  to  the  problem  posed 
earlier. In Section 8 we explain how to reintegrate a repaired process. Finally, Section 9 consists of a
brief  description  of  an  algorithm  to  establish  synchronization  initially. 

2.  A  Model  for  Systems  of  Processes  with  Clocks 

This  section  is  an  informal  description  of  the  model  used  to  describe  a  system  of  processes  which 
have physical clocks. A completely formal development will appear in [Lu].

2.1.  Processes,  Clocks,  and  Systems 

We  model  a  distributed  system  consisting  of  a  set  of  processes  that  communicate  by  sending 
messages  to  each  other.  Each  process  has  a  physical  clock  that  is  not  under  its  control. 

A  typical  message  consists  of  text  and  the  sending  process'  name.  There  are  also  two  special 
messages,  START,  which  comes  from  an  external  source  and  indicates  that  the  recipient  should 
begin  the  algorithm,  and  TIMER,  which  a  process  receives  when  its  physical  clock  has  reached  a 
designated  time. 

A process is an automaton with a set of states and a transition function. The transition function
describes  the  new  state  the  process  enters,  the  messages  it  sends  out,  and  the  timers  it  sets  for  itself, 
all  as  a  function  of  the  process'  current  state,  received  message  and  physical  clock  time.  An 
application  of  the  transition  function  constitutes  a  process  step,  the  only  kind  of  event  in  our  model. 

The  system  is  interrupt-driven  in  that  a  process  only  takes  a  step  when  a  message  arrives.  The 
message  may  come  from  another  process,  or  it  may  be  a  TIMER  message  that  was  sent  by  the 
process  itself.  Thus,  by  using  a  TIMER  message,  a  process  can  ensure  that  an  interrupt  will  occur  at 
a  specified  time  in  the  future.  We  neglect  local  processing  time  by  assuming  that  the  processing  of  an 
arriving  message  is  instantaneous. 

We define a clock to be a monotonically increasing, everywhere differentiable function from $\mathbb{R}$ (real
time) to $\mathbb{R}$ (clock time). A system of processes consists of a set of processes, a subset of the
processes called the self-starting processes, and a set of clocks (the physical clocks), one for each
process. The physical clock for process p will be denoted $Ph_p$.

2.2.  The  Message  System 

Every  process  can  communicate  directly  with  every  process,  including  itself.  The  message  system 
is  modelled  by  a  global  message  buffer.  When  a  process  sends  a  message  at  real  time  t  to  another 
process,  the  message  is  placed  in  the  message  buffer  together  with  a  time  t’  greater  than  t.  At  real 
time  t’,  the  message  is  received  by  the  proper  recipient  and  is  deleted  from  the  buffer.  The  message 
delay is t' - t. Initially the message buffer contains no messages except for START messages, exactly
one  for  each  self-starting  process. 

When a process p sets a timer, say for time $T$, a TIMER message with recipient p and delivery time
$Ph_p^{-1}(T)$ is placed in the message buffer, as long as $Ph_p^{-1}(T)$ is not less than the current real time. If it
is, no message is placed in the buffer.

2.3.  Executions 

There  is  only  one  type  of  event  in  this  model,  receive(m,p),  the  receipt  of  message  m  by  process  p. 
In  order  to  discuss  how  an  event  affects  the  system  as  a  whole,  we  define  a  configuration  to  consist  of 
a  state  for  each  process  and  a  state  for  the  message  buffer.  An  event  surrounded  by  the 
configurations of the system immediately before the event and immediately afterwards, e.g. (F, e, F'), is
an  action. 

We  define  an  execution  of  the  system  to  be  a  mapping  from  real  times  to  sequences  of  actions  with 
the following properties:

•  the  configurations  match  up  correctly,  that  is,  the  second  configuration  of  an  action  is  the 
same  as  the  first  one  of  the  following  action; 

•  all  TIMER  messages  received  by  a  particular  process  p  that  arrive  at  real  time  t  are 
ordered  after  any  non-TIMER  messages  for  p  that  arrive  at  real  time  t  (so  messages  that 
arrive  at  the  same  time  as  a  timer  is  due  to  go  off  get  in  "just  under  the  wire"); 

•  if  an  action  (F,  receive(m,p),  F’)  occurs  at  real  time  t,  then  the  only  differences  between  F 
and  F’  are  that  p’s  state  may  change  and  that  the  message  buffer  in  F'  no  longer  contains 
m  but  may  contain  some  messages  and  timers  from  p;  furthermore,  if  p  is  nonfaulty,  then 
its  new  state  and  the  additions  to  the  message  buffer  are  determined  by  p’s  transition 
function  acting  on  p’s  state  in  F,  the  message  m,  and  the  physical  time  Php(t); 

•  if  any  process  p  sets  a  timer  for  a  future  time  t,  then  at  time  t,  p  receives  a  TIMER 
message;  furthermore,  if  any  nonfaulty  process  p  receives  a  TIMER  message  at  time  t, 
then  earlier  p  set  a  timer  for  t;  and 

•  a  message  m  is  received  at  real  time  t  if  and  only  if  the  message  buffer  contained  m  with  t 
recorded  as  the  time  at  which  it  was  to  be  delivered. 


Since faulty processes need not obey the conditions in the third and fourth properties listed above,
they can choose when they take steps and can do anything they want at a step.
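
The message buffer and the ordering rule for TIMER messages can be illustrated with a small sketch. This is only an illustration under stated assumptions, not part of the formal model: the class name `MessageBuffer`, its method names, and the use of a priority queue are hypothetical choices, and the inverse physical clock is passed in as an ordinary function.

```python
import heapq
import itertools


class MessageBuffer:
    """Hypothetical sketch of the global message buffer of Section 2.2."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so payloads are never compared

    def send(self, now, delay, message, recipient):
        # A message sent at real time t is stored with a delivery time t' > t.
        if delay <= 0:
            raise ValueError("message delay must be positive")
        # kind 0 marks an ordinary (non-TIMER) message.
        heapq.heappush(self._heap, (now + delay, 0, next(self._seq), message, recipient))

    def set_timer(self, now, inverse_clock, clock_time, recipient):
        # The TIMER is due at the real time at which the physical clock reads clock_time.
        due = inverse_clock(clock_time)
        if due < now:
            return False  # timer for a past time: nothing is placed in the buffer
        # kind 1 sorts a TIMER after ordinary messages with the same delivery time.
        heapq.heappush(self._heap, (due, 1, next(self._seq), "TIMER", recipient))
        return True

    def pop(self):
        # Deliver (and delete) the next message in real-time order.
        due, _kind, _seq, message, recipient = heapq.heappop(self._heap)
        return due, message, recipient
```

With this ordering, an ordinary message that arrives at exactly the real time a timer is due is delivered first, which is the "just under the wire" rule in the list of execution properties.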


3.  The  Clock  Synchronization  Problem 

3.1. Clocks

In this paper, clock names are capitalized. For each clock, the inverse function has the same name,
but it is not capitalized.

For a very small constant $\rho > 0$, we define a clock $C$ to be $\rho$-bounded provided that for all $t$,

$$1 - \rho < \frac{1}{1+\rho} \le C'(t) \le 1 + \rho < \frac{1}{1-\rho}.$$

Henceforth we assume that all clocks are $\rho$-bounded, i.e., the amount by which a clock's rate is faster
or slower than real time is at most $\rho$.


We give several straightforward lemmas about the behavior of ($\rho$-bounded) clocks.
Lemma 1: Let C be any clock.

(a) If $t_1 < t_2$, then

$$(1-\rho)(t_2 - t_1) < \frac{t_2 - t_1}{1+\rho} \le C(t_2) - C(t_1) \le (1+\rho)(t_2 - t_1) < \frac{t_2 - t_1}{1-\rho}.$$

(b) If $T_1 < T_2$, then

$$(1-\rho)(T_2 - T_1) < \frac{T_2 - T_1}{1+\rho} \le c(T_2) - c(T_1) \le (1+\rho)(T_2 - T_1) < \frac{T_2 - T_1}{1-\rho}.$$

Proof: Straightforward. ∎

Lemma 2: Let C and D be clocks.

(a) If $C' = 1$ and $T_1 < T_2$, then

$$|(c(T_2) - d(T_2)) - (c(T_1) - d(T_1))| = |(c(T_2) - c(T_1)) - (d(T_2) - d(T_1))| \le \rho(T_2 - T_1).$$

(b) If $T_1 < T_2$, then

$$|(c(T_2) - d(T_2)) - (c(T_1) - d(T_1))| = |(c(T_2) - c(T_1)) - (d(T_2) - d(T_1))| \le 2\rho(T_2 - T_1).$$

(c) If $C' = 1$ and $t_1 < t_2$, then

$$|(C(t_2) - D(t_2)) - (C(t_1) - D(t_1))| = |(C(t_2) - C(t_1)) - (D(t_2) - D(t_1))| \le \rho(t_2 - t_1).$$

(d) If $t_1 < t_2$, then

$$|(C(t_2) - D(t_2)) - (C(t_1) - D(t_1))| = |(C(t_2) - C(t_1)) - (D(t_2) - D(t_1))| \le 2\rho(t_2 - t_1).$$

Proof: Straightforward using Lemma 1. ∎

Lemma 3: Let C and D be clocks, $T_1 < T_2$. Assume $|c(T) - d(T)| \le \alpha$ for all $T$, $T_1 \le T \le T_2$.
Let $t_1 = \min\{c(T_1), d(T_1)\}$ and $t_2 = \max\{c(T_2), d(T_2)\}$. Then $|C(t) - D(t)| \le (1+\rho)\alpha$ for
all $t$, $t_1 \le t \le t_2$.

Proof: There are four cases, which can easily be shown to be exhaustive.

Case 1: $c(T_1) \le t \le c(T_2)$.

Let $T_3 = C(t)$, so that $T_1 \le T_3 \le T_2$. By hypothesis, $|c(T_3) - d(T_3)| \le \alpha$. Then $|T_3 - D(t)| \le (1+\rho)\alpha$, by Lemma 1.

Case 2: $d(T_1) \le t \le d(T_2)$. This case is analogous to the first.

Case 3: $c(T_2) < t < d(T_1)$.
Then $c(T_1) < t < d(T_1)$. So $C(t) > D(t)$, and thus

$$|C(t) - D(t)| = C(t) - D(t) = (C(t) - T_1) + (T_1 - D(t))$$

$$\le (1+\rho)(t - c(T_1)) + (1+\rho)(d(T_1) - t), \text{ by Lemma 1,}$$

$$= (1+\rho)(d(T_1) - c(T_1)) \le (1+\rho)\alpha.$$

Case 4: $d(T_2) < t < c(T_1)$. This case is analogous to the third. ∎


Each process p has a local variable CORR, which provides a correction to its physical clock to yield
the "local" time. During an execution, p's local variable CORR takes on different values. Thus, for a
particular execution, it makes sense to define a function $CORR_p(t)$, giving the value of p's variable
CORR at time $t$.

For a particular execution, we define the local time for p to be the function $L_p$, which is given by
$Ph_p + CORR_p$.

A logical clock of p is $Ph_p$ plus the value of $CORR_p$ at some time. Let $C^0_p$ denote the initial logical
clock of p, given by $Ph_p$ plus the value of $CORR_p$ in p's initial state. In keeping with our notational
convention, we let $c^0_p$ denote the inverse function of $C^0_p$. Each time p adjusts its CORR variable, it
can be thought of as changing to a new logical clock. The local time can be thought of as a
piecewise continuous function, each of whose pieces is part of a logical clock.
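
The relationship between physical clock, correction variable, and local time can be sketched as follows. This is a minimal illustration, assuming for simplicity a physical clock with a constant rate (the model only requires a bounded rate); the class `Process` and the helper `is_rho_bounded` are hypothetical names introduced here.

```python
class Process:
    """Hypothetical process with a linear physical clock Ph(t) = rate * t + offset."""

    def __init__(self, rate, offset, corr):
        self.rate = rate      # assumed constant; the model only bounds the rate
        self.offset = offset
        self.corr = corr      # the CORR variable

    def physical(self, t):
        # Read-only physical clock Ph_p(t).
        return self.rate * t + self.offset

    def local_time(self, t):
        # L_p(t) = Ph_p(t) + CORR_p(t), using the current value of CORR.
        return self.physical(t) + self.corr

    def logical_inverse(self, T):
        # Inverse of the current logical clock: real time at which it reads T.
        return (T - self.corr - self.offset) / self.rate


def is_rho_bounded(rate, rho):
    # Rate condition for a rho-bounded clock with constant rate.
    return 1.0 / (1.0 + rho) <= rate <= 1.0 + rho
```

Changing `corr` switches the process to a new logical clock, so `local_time` jumps at that instant while `physical` is unaffected.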

3.2. Problem Statement

We make the following assumptions:

(1) All clocks are $\rho$-bounded, including those of faulty processes. (Since faulty processes are
permitted to take arbitrary steps, faulty clocks would not increase their power to affect the behavior of
nonfaulty processes.)

(2) There are at most f faulty processes, for a fixed constant f, and the total number of processes in
the system, n, is at least 3f + 1. (Dolev, Halpern and Strong [DHS] show that it is impossible without
authentication to synchronize clocks unless more than 2/3 of the processes are nonfaulty.)

(3) The message delay for every message is in the range $[\delta - \epsilon, \delta + \epsilon]$, for some nonnegative
constants $\delta$ and $\epsilon$ with $\delta > \epsilon$.

(4) A START message arrives at each process p at time $T^0$ on its initial logical clock $C^0_p$, and $t^0_p$ is
the real time when this occurs. Furthermore, the initial logical clocks are closely synchronized, i.e.,
$|c^0_p(T^0) - c^0_q(T^0)| \le \beta$, for some fixed $\beta$ and all nonfaulty p and q.

We let $tmax^0 = \max_{p \text{ nonfaulty}} \{t^0_p\}$ and analogously for $tmin^0$.

The object is to design an algorithm for which every execution in which the assumptions above hold
satisfies the following two properties.

1. $\gamma$-Agreement: $|L_p(t) - L_q(t)| \le \gamma$, for all $t \ge tmin^0$ and all nonfaulty p, q.


2. $(\alpha_1, \alpha_2, \alpha_3)$-Validity: $\alpha_1(t - tmax^0) + T^0 - \alpha_3 \le L_p(t) \le \alpha_2(t - tmin^0) + T^0 + \alpha_3$, for all
$t \ge t^0_p$ and all nonfaulty p.

The  Agreement  property  means  that  all  the  nonfaulty  processes  are  synchronized  to  within  y.  The 
Validity  property  means  that  the  local  time  of  a  nonfaulty  process  increases  in  some  relation  to  real 
time. We would, of course, like to minimize $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\gamma$.

4.  The  Algorithm 

4.1 .  General  Description 

The algorithm executes in a series of rounds, the i-th round for a process triggered by its logical
clock reaching some value $T^i$. (It will be shown that the logical clocks reach this value within real time
$\beta$ of each other.) When any process p's logical clock reaches $T^i$, p broadcasts a $T^i$ message.
Meanwhile, p collects $T^i$ messages from as many processes as it can, within a particular bounded
amount of time, measured on its logical clock. The bounded amount of time is of length
$(1+\rho)(\beta + \delta + \epsilon)$, and is chosen to be just large enough to ensure that $T^i$ messages are received from all
nonfaulty processes. After waiting this amount of time, p averages the arrival times of all the $T^i$
messages received, using a particular fault-tolerant averaging function. The resulting average is used
to calculate an adjustment to p's correction variable, thereby switching p to a new logical clock.

The process p then waits until its new clock reaches time $T^{i+1} = T^i + P$, and repeats the
procedure. $P$, then, is the length of a round in local time.

The  fault-tolerant  averaging  function  is  derived  from  those  used  in  [DLPSW]  for  reaching 
approximate  agreement.  The  function  is  designed  to  be  immune  to  some  fixed  maximum  number,  f, 
of  faults.  It  first  throws  out  the  f  highest  and  f  lowest  values,  and  then  applies  some  ordinary 
averaging  function  to  the  remaining  values.  In  this  paper,  we  choose  the  midpoint  of  the  range  of  the 
remaining  values,  to  be  specific. 
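
The averaging function just described can be sketched in a few lines. This is an illustrative sketch only: the function names `reduce_values`, `mid`, and `fault_tolerant_average` are chosen here for readability, and the input is assumed to hold at least 2f + 1 arrival times.

```python
def reduce_values(values, f):
    """Drop the f highest and f lowest values; return the rest in sorted order."""
    if len(values) < 2 * f + 1:
        raise ValueError("need at least 2f + 1 values")
    ordered = sorted(values)
    return ordered[f:len(ordered) - f]


def mid(values):
    """Midpoint of the range of the given values."""
    return (min(values) + max(values)) / 2.0


def fault_tolerant_average(values, f):
    """Midpoint of the values that survive removal of the f extremes on each side."""
    return mid(reduce_values(values, f))
```

For example, with f = 1 and arrival times 10, 11, 12 and 1000, the outlier 1000 and the lowest value 10 are discarded and the result is the midpoint of 11 and 12. Up to f arbitrary values can therefore never pull the result outside the range of the values reported by nonfaulty processes.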

4.2.  Code  for  an  Arbitrary  Process 

Global constants: $\rho$, $\beta$, $\delta$, $\epsilon$, and $P$, as defined above.

Local variables:

• CORR, initially arbitrary; correction variable which corrects physical time to logical time.

• ARR[q], initially arbitrary; array containing the arrival times of the most recent messages,
one entry for each process q.

• T, initially undefined; local time at which the process next intends to send a message.

Conventions:

• NOW stands for the current logical clock time (i.e., the physical clock reading + CORR).
NOW is assumed to be set at the beginning of a step, and cannot be assigned to.

• REDUCE, applied to an array, returns the multiset consisting of the elements of the array,
with the f highest and f lowest elements removed.

• MID, applied to a multiset of real numbers, returns the midpoint of the set of values in the
multiset.

beginstep(u)

do forever

    /* in case T^i messages are received before this process reaches T^i */

    while u = (m,q) for some message m and process q do
        ARR[q] := NOW
        endstep
        beginstep(u)
    endwhile

    /* fall out of the loop when u = START or TIMER; begin round */

    T := NOW
    broadcast(T)
    set-timer(T + (1 + ρ)(β + δ + ε))
    endstep
    beginstep(u)

    while u = (m,q) for some message m and process q do
        ARR[q] := NOW
        endstep
        beginstep(u)
    endwhile

    /* fall out of the loop when u = TIMER; end round */

    AV := mid(reduce(ARR))
    ADJ := T + δ - AV
    CORR := CORR + ADJ
    set-timer(T + P)
    endstep
    beginstep(u)
enddo
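
The end-of-round computation in the code above (AV, ADJ, and the length of the collection window) can be mirrored by a short sketch. The function names `round_adjustment` and `collection_window` are hypothetical, arrival times are assumed to be readings of the process's own logical clock, and at least 2f + 1 entries are assumed.

```python
def collection_window(rho, beta, delta, eps):
    """Length, in logical clock time, of the interval during which messages are collected."""
    return (1.0 + rho) * (beta + delta + eps)


def round_adjustment(T, delta, arrival_times, f):
    """Return ADJ = T + delta - AV, where AV = mid(reduce(ARR))."""
    if len(arrival_times) < 2 * f + 1:
        raise ValueError("need at least 2f + 1 arrival times")
    ordered = sorted(arrival_times)
    kept = ordered[f:len(ordered) - f]        # reduce: drop f lowest and f highest
    av = (kept[0] + kept[-1]) / 2.0           # mid: midpoint of the remaining range
    return T + delta - av
```

A message sent when the sender's clock reads T and delayed by exactly δ would arrive at T + δ on a perfectly synchronized receiver. If the averaged arrival time is later than T + δ on the receiver's clock, the adjustment is negative and the receiver's clock is set back; if it is earlier, the clock is advanced.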

We have employed a clean, simple notation for describing interrupt-driven algorithms. To translate
this notation into the basic model, we first assume that the state of a process consists of values for all
local variables, together with a location counter which indicates the next beginstep statement to
be executed. The initial state of a process consists of the indicated initial values for all the local


We assume that p can awaken at an arbitrary time during an execution, perhaps during the middle
of a round. As soon as it awakens, it begins collecting $T^i$ messages for all plausible values of $T^i$. It is
necessary that p identify an appropriate round i at which p is able to obtain all the $T^i$ messages from
nonfaulty processes. Since p might awaken during the middle of a round, p will first orient itself by
observing the arriving messages, allowing part of a round to pass before it begins to collect
messages. More specifically, p first seeks an i such that f $T^{i-1}$ messages arrive within an interval of
length at most $(1+\rho)(\beta + 2\epsilon)$ as measured on its clock. There will always be such an i because all
messages from nonfaulty processes for each round arrive within $\beta + 2\epsilon$ real time of each other, and
thus within $(1+\rho)(\beta + 2\epsilon)$ clock time.

Assuming that p itself is still counted as one of the faulty processes, at least one of the f arriving
messages must be from a nonfaulty process. Thus, p knows that round i - 1 is in progress or has just
ended, and that it should use $T^i$ messages to update its clock.

Then p continues to collect $T^i$ messages. It must wait $(1+\rho)(\beta + 2\epsilon + (1+\rho)(P + (1+\rho)(\beta + \epsilon) + \rho\delta))$
time, as measured on its clock, after receiving the f-th $T^{i-1}$ message in order to guarantee that it has
received $T^i$ messages from all nonfaulty processes. The maximum amount of real time p must wait,
$\beta + 2\epsilon + (1+\rho)(P + (1+\rho)(\beta + \epsilon) + \rho\delta)$, elapses if the f-th $T^{i-1}$ message is from a nonfaulty
process q and it took $\delta - \epsilon$ time to arrive, if q's round i - 1 lasts as long as possible,
$(1+\rho)(P + (1+\rho)(\beta + \epsilon) + \rho\delta)$ (because its clock is slow and it adds the maximum amount to its clock), and if there
is a nonfaulty process r that is $\beta$ behind q in reaching $T^i$ and its $T^i$ message to p takes $\delta + \epsilon$. The
process waits this maximum amount of time multiplied by $(1+\rho)$ to account for a fast clock.
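
As a rough numerical illustration of this waiting time, the following sketch takes the longest round in real time to be (1 + ρ)(P + (1 + ρ)(β + ε) + ρδ) and the maximum real-time wait to be β + 2ε plus that quantity. The function names are hypothetical, and the formulas are the ones assumed in this sentence, not an independent derivation.

```python
def longest_round_real_time(rho, beta, delta, eps, P):
    """Assumed longest real-time duration of a round (slow clock, maximum adjustment)."""
    return (1.0 + rho) * (P + (1.0 + rho) * (beta + eps) + rho * delta)


def reintegration_wait(rho, beta, delta, eps, P):
    """Return (maximum real-time wait, wait as measured on the repaired process's clock)."""
    real_wait = beta + 2.0 * eps + longest_round_real_time(rho, beta, delta, eps, P)
    clock_wait = (1.0 + rho) * real_wait   # scaled up in case the local clock is fast
    return real_wait, clock_wait
```

With no drift (ρ = 0) the two values coincide; with ρ > 0 the clock-time wait is strictly larger, which is the safety margin for a fast clock.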

(Some slight extra bookkeeping is necessary because $T^i$ messages from nonfaulty processes can
arrive at p before p has received the f-th $T^{i-1}$ message. We omit a description of a scenario in which
this occurs.)

Immediately after p determines it has waited long enough, it carries out the averaging procedure
and determines a value for its correction variable.

We claim that p reaches $T^{i+1}$ on its new clock within $\beta$ of every other nonfaulty process. First,
note that it does not matter that p's clock begins initially unsynchronized with all the other clocks;
the arbitrary clock will be compensated for in the subtraction of the average arrival time. Second,
note that it does not matter that p is not sending out a $T^i$ message; p is being counted as one of the
faulty processes, which could always fail to send a message. (Processes do not treat themselves
specially in our algorithm, so it does not matter that p fails to receive a message from itself.) Finally,
tmin^0) + T^0 + iε.

But then the inductive hypothesis is violated, since $t'$, the time when p receives q's $T^i$
message, is greater than or equal to $u^{i-1}_q$, the time when q sets its round i clock. ∎

Now, we can state the validity condition. Let $\phi = (P - (1+\rho)(\beta + \epsilon) - \rho\delta)/(1+\rho)$. This is the
size of the shortest round in real time since the amount of clock time elapsed during a round is at least
$P$ minus the maximum adjustment.

Theorem 20: The algorithm preserves $(\alpha_1, \alpha_2, \alpha_3)$-validity,

where $\alpha_1 = 1 - \rho - \epsilon/\phi$, $\alpha_2 = 1 + \rho + \epsilon/\phi$, and $\alpha_3 = \epsilon$.

Proof: We must show for all $t \ge t^0_p$ and all nonfaulty p that

$$\alpha_1(t - tmax^0) + T^0 - \alpha_3 \le L_p(t) \le \alpha_2(t - tmin^0) + T^0 + \alpha_3.$$

We know from the preceding lemma that for $i \ge 0$, $t \ge u^{i-1}_p$ (or $t \ge t^0_p$ for $i = 0$), and nonfaulty p

$$(1-\rho)(t - tmax^0) + T^0 - i\epsilon \le C^i_p(t) \le (1+\rho)(t - tmin^0) + T^0 + i\epsilon.$$

Since $L_p(t)$ is equal to $C^i_p(t)$ for some i, we just need to convert i into an expression in
terms of t, etc. An upper bound on i is $1 + (t - tmax^0)/\phi$. Then

$$(1+\rho)(t - tmin^0) + T^0 + i\epsilon \le (1+\rho)(t - tmin^0) + T^0 + (1 + (t - tmax^0)/\phi)\epsilon$$

$$\le (1 + \rho + \epsilon/\phi)(t - tmin^0) + T^0 + \epsilon, \text{ since } tmin^0 \le tmax^0,$$

and that

$$(1-\rho)(t - tmax^0) + T^0 - i\epsilon \ge (1-\rho)(t - tmax^0) + T^0 - (1 + (t - tmax^0)/\phi)\epsilon$$

$$\ge (1 - \rho - \epsilon/\phi)(t - tmax^0) + T^0 - \epsilon.$$

The result follows. ∎
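
To get a feel for the validity constants, the following sketch evaluates the shortest real-time round length and the three constants of Theorem 20 for given parameters. The function name `validity_constants` and the dictionary keys are hypothetical; the formulas assumed are φ = (P − (1 + ρ)(β + ε) − ρδ)/(1 + ρ), α1 = 1 − ρ − ε/φ, α2 = 1 + ρ + ε/φ, and α3 = ε.

```python
def validity_constants(rho, beta, delta, eps, P):
    """Evaluate phi and the validity constants (alpha1, alpha2, alpha3)."""
    phi = (P - (1.0 + rho) * (beta + eps) - rho * delta) / (1.0 + rho)
    if phi <= 0:
        raise ValueError("round length P is too short for these parameters")
    return {
        "phi": phi,                        # shortest round in real time
        "alpha1": 1.0 - rho - eps / phi,   # lower bound on the long-run rate
        "alpha2": 1.0 + rho + eps / phi,   # upper bound on the long-run rate
        "alpha3": eps,                     # additive slack
    }
```

The term ε/φ shows how the per-round adjustment error of at most ε is spread over a round of real-time length at least φ: longer rounds bring the rate bounds closer to 1 − ρ and 1 + ρ.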

8.  Reintegrating  a  Failed  Process 

Our  algorithm  can  be  modified  to  allow  a  faulty  process  which  has  been  repaired  to  synchronize  its 
clock  with  the  other  nonfaulty  processes.  Let  p  be  the  process  to  be  reintegrated  into  the  system. 
During  some  round  i,  p  will  gather  messages  from  the  other  processes  and  perform  the  same 
averaging  procedure  described  previously  to  obtain  a  value  for  its  correction  variable  such  that  its 
clock becomes synchronized. Since p's clock is now synchronized, it will reach $T^{i+1}$ within $\beta$ of every
other nonfaulty process. At that point, p is no longer faulty and rejoins the main algorithm, sending
out $T^{i+1}$ messages.

Validity

Next, we show the validity condition. The first lemma bounds the values of the zero-index clocks.

Lemma 18: $T^0 + (1-\rho)(t - t^0_p) \le C^0_p(t) \le T^0 + (1+\rho)(t - t^0_p)$ for $t \ge t^0_p$.

Proof: By Lemma 1. ∎


The next lemma is the main one.

Lemma 19: Let p be nonfaulty, $i \ge 0$. Then

$$(1-\rho)(t - tmax^0) + T^0 - i\epsilon \le C^i_p(t) \le (1+\rho)(t - tmin^0) + T^0 + i\epsilon$$

for all $t \ge u^{i-1}_p$ if $i \ge 1$, and for all $t \ge t^0_p$ if $i = 0$.

Proof: We proceed by induction on i. When proving the result for i + 1, we will assume
the result for i, for all executions of the algorithm (rather than just the execution in
question).

Basis: i = 0. This case follows immediately by Lemma 18.

Induction: Assume the result has been shown for i and show it for i + 1.

We argue the right-hand inequality first. The left-hand inequality is entirely analogous.

Assume in contradiction that we have a particular execution in which
$C^{i+1}_p(t) > (1+\rho)(t - tmin^0) + T^0 + (i+1)\epsilon$ for some $t \ge u^i_p$. Then by the limitations on rates of clocks, it is
clear that $C^{i+1}_p(u^i_p) > (1+\rho)(u^i_p - tmin^0) + T^0 + (i+1)\epsilon$.

Recall that p resets its clock at real time $u^i_p$, by adding $T^i + \delta - AV^i_p$. In this case, the
inductive hypothesis implies that the adjustment must be an increment.

By Lemma 5, this increment is $T^i + \delta - ARR^i_p(q)$ for some nonfaulty q. Therefore,

$$C^i_p(u^i_p) + T^i + \delta - ARR^i_p(q) > (1+\rho)(u^i_p - tmin^0) + T^0 + (i+1)\epsilon.$$

Next, we claim that if p had done the adjustment just when the message arrived from q
rather than waiting till real time $u^i_p$, the bound would still have been exceeded. That is,
$ARR^i_p(q) + T^i + \delta - ARR^i_p(q) > (1+\rho)(t' - tmin^0) + T^0 + (i+1)\epsilon$, where
$t' = c^i_p(ARR^i_p(q))$. (This again follows by the limits on the rates of clocks.) Thus,

$$T^i + \delta > (1+\rho)(t' - tmin^0) + T^0 + (i+1)\epsilon.$$

Now consider an alternative execution of the algorithm in which everything is exactly like
the one we have been describing, except that immediately after q sends out clock reading
$T^i$, q's clock $C^i_q$ begins to move at rate 1. This change cannot affect p's (i+1)-st clock
because q doesn't send any more messages until $T^{i+1}$, and these messages aren't
received until after the time when p sets its (i+1)-st clock.

By the lower bound on message delays, q's message to p took at least $\delta - \epsilon$ time. Then
at real time $t'$ (defined above), we have $C^i_q(t') \ge T^i + \delta - \epsilon$. But then C^i_q(t') > (1 + ρ)(t' -


Theorem 17: The algorithm guarantees $\gamma$-agreement,

where $\gamma = \beta + \epsilon + \rho(7\beta + 3\delta + 7\epsilon) + 8\rho^2(\beta + \delta + \epsilon) + 4\rho^3(\beta + \delta + \epsilon)$.

Proof: The result for intervals in which the processes use clocks with the same indices
has been covered in the preceding lemma. The expression in the statement of that lemma
simplifies to

$$\beta + \rho(3\beta + 2\delta + 2\epsilon) + 4\rho^2(\beta + \delta + \epsilon) + 2\rho^3(\beta + \delta + \epsilon),$$

which is less than $\gamma$.

Next, we must consider the case where one of the processes has changed to a new
clock, while the other still retains the old clock. Consider $|C^{i+1}_p(t) - C^i_q(t)|$ for some t with
$u^i_p \le t \le u^i_q$. Lemma 15 implies that there exist nonfaulty processes r and s such that

$$C^i_r(t) - \alpha \le C^{i+1}_p(t) \le C^i_s(t) + \alpha,$$

where $\alpha = \epsilon + \rho(4\beta + \delta + 5\epsilon) + 4\rho^2(\beta + \delta + \epsilon) + 2\rho^3(\beta + \delta + \epsilon)$. Then

$$|C^{i+1}_p(t) - C^i_q(t)| \le \alpha + \max\{|C^i_r(t) - C^i_q(t)|, |C^i_s(t) - C^i_q(t)|\}$$

$$\le \alpha + (1+\rho)(\beta + 2\rho(1+\rho)(\beta + \delta + \epsilon)), \text{ by the preceding lemma,}$$

$$= \beta + \epsilon + \rho(7\beta + 3\delta + 7\epsilon) + 8\rho^2(\beta + \delta + \epsilon) + 4\rho^3(\beta + \delta + \epsilon), \text{ as needed.}$$ ∎
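
The final algebraic step of this proof can be checked numerically. The sketch below is only a sanity check under the assumption that the agreement bound is γ = β + ε + ρ(7β + 3δ + 7ε) + 8ρ²(β + δ + ε) + 4ρ³(β + δ + ε), with α and the same-index bound as in the proof; the function names are hypothetical.

```python
def agreement_bound(rho, beta, delta, eps):
    """Closed form assumed for gamma in Theorem 17."""
    s = beta + delta + eps
    return (beta + eps + rho * (7 * beta + 3 * delta + 7 * eps)
            + 8 * rho**2 * s + 4 * rho**3 * s)


def agreement_bound_from_parts(rho, beta, delta, eps):
    """alpha plus the same-index bound (1 + rho)(beta + 2 rho (1 + rho)(beta + delta + eps))."""
    s = beta + delta + eps
    alpha = eps + rho * (4 * beta + delta + 5 * eps) + 4 * rho**2 * s + 2 * rho**3 * s
    same_index = (1 + rho) * (beta + 2 * rho * (1 + rho) * s)
    return alpha + same_index
```

Both functions should agree up to rounding for any parameter values, and with ρ = 0 the bound reduces to β + ε.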

Now we can sketch why it is reasonable for $\beta$ to be approximately $4\epsilon + 4\rho P$, as mentioned at the
end of Section 5.1. Assume $P$ is fixed. The i-th clocks reach $T^i$ within $\beta$ of each other. After the
processes reset their clocks, the new clocks reach $U^i$ within $\beta/2 + 2\epsilon$ (ignoring $\rho$ terms). By the end
of the round, the clocks reach $T^{i+1}$ within about $\beta/2 + 2\epsilon + 2\rho P$ of each other, because of drift.
This quantity must be at most $\beta$. The inequality $\beta/2 + 2\epsilon + 2\rho P \le \beta$ yields $\beta \ge 4\epsilon + 4\rho P$.

Suppose we alter the algorithm so that during each round, the processes exchange clock values k times instead of just once. Then we get $\beta/2^k + (4 - 2^{2-k})\epsilon + 2\rho P \le \beta$, which simplifies to $\beta \ge 4\epsilon + 2\rho P(2^k/(2^k - 1))$. It appears that $\beta \ge 4\epsilon + 2\rho P$ is approachable.
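The sketch above treats each round as a map from the old separation to the new one. A small illustration, assuming that simplified per-round map (ignoring the same $\rho$ terms the text ignores), shows the separation settling at the stated limit. The function names and sample values are hypothetical.

```python
# Iterate the simplified per-round map
#   b -> b / 2**k + (4 - 2**(2 - k)) * eps + 2 * rho * P
# and compare with the closed-form fixed point 4*eps + 2*rho*P * 2**k / (2**k - 1).
# This is an illustration of the heuristic argument only, not of the full analysis.

def steady_state_beta(eps, rho, P, k=1):
    """Fixed point of the simplified per-round map for k exchanges per round."""
    return 4 * eps + 2 * rho * P * (2**k / (2**k - 1))

def iterate_beta(beta0, eps, rho, P, k=1, rounds=60):
    """Apply the simplified per-round map `rounds` times starting from beta0."""
    b = beta0
    for _ in range(rounds):
        b = b / 2**k + (4 - 2 ** (2 - k)) * eps + 2 * rho * P
    return b

eps, rho, P = 1e-3, 1e-6, 10.0
for k in (1, 2, 3):
    print(k, iterate_beta(1.0, eps, rho, P, k), steady_state_beta(eps, rho, P, k))
```

With k = 1 the limit is $4\epsilon + 4\rho P$; as k grows, the drift term approaches $2\rho P$.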

If  n  increases  while  f  remains  fixed,  a  greater  closeness  of  synchronization  can  be  achieved  by 
using  the  mean  instead  of  the  midpoint  in  the  algorithm.  Similarly  to  [DLPSW],  we  can  show  that  the 
convergence  rate  if  the  mean  is  used  is  roughly  f/(n-2f),  and  that  an  error  of  approximately  2e  is 
approachable. 


and for $\min\{t^0_p, t^0_q\} \le t < \max\{u^0_p, u^0_q\}$, if $i = 0$.

Proof: Basis: $i = 0$. Lemma 14 implies that $|c^i_p(T) - c^i_q(T)| \le \beta + 2\rho(1 + \rho)(\beta + \delta + \epsilon)$ for all T, $U^{i-1} \le T \le U^i$ if $i \ge 1$ and for all T, $T^0 \le T \le U^0$ if $i = 0$. Then Lemma 3 immediately implies the needed result for $i = 0$.

Induction: $i \ge 1$. Lemma 3 implies the result for all t with

$$\min\{c^i_p(U^{i-1}),\ c^i_q(U^{i-1})\} \le t < \max\{u^i_p,\ u^i_q\}.$$

It remains to show the bound for t with

$$\max\{u^{i-1}_p,\ u^{i-1}_q\} \le t < \min\{c^i_p(U^{i-1}),\ c^i_q(U^{i-1})\}.$$

Without loss of generality, assume that $c^i_p(U^{i-1}) \le c^i_q(U^{i-1})$, so that the minimum is equal to $c^i_p(U^{i-1})$.

$$|C^i_p(t) - C^i_q(t)| \le |(C^i_p(t) - C^i_q(t)) - (C^i_p(c^i_p(U^{i-1})) - C^i_q(c^i_p(U^{i-1})))|$$

$$+\ |C^i_p(c^i_p(U^{i-1})) - C^i_q(c^i_p(U^{i-1}))|$$

The first term, by Lemma 2, is at most $2\rho(c^i_p(U^{i-1}) - t)$. Since $t \ge \max\{u^{i-1}_p, u^{i-1}_q\} \ge u^{i-1}_p$, we have

$$2\rho(c^i_p(U^{i-1}) - t) \le 2\rho(c^i_p(U^{i-1}) - c^{i-1}_p(U^{i-1})).$$

Since $c^{i-1}_p(U^{i-1}) = c^i_p(T)$ for some T with $|T - U^{i-1}| \le |ADJ^{i-1}_p|$, this quantity is

$$\le 2\rho|c^i_p(U^{i-1}) - c^i_p(T)|$$

$$\le 2\rho(1 + \rho)|U^{i-1} - T|,$$ by Lemma 1

$$\le 2\rho(1 + \rho)|ADJ^{i-1}_p|$$

$$\le 2\rho(1 + \rho)((1 + \rho)(\beta + \epsilon) + \rho\delta),$$ by Lemma 6.

To bound the second term we note that Lemma 11 implies that
$$|c^i_p(U^{i-1}) - c^i_q(U^{i-1})| \le \beta/2 + 2\epsilon + 2\rho(3\beta + 2\delta + 3\epsilon) + 4\rho^2(\beta + \delta + \epsilon) = a,$$
and so Lemma 3, with $T_1 = T_2 = U^{i-1}$, implies that
$$|C^i_p(c^i_p(U^{i-1})) - C^i_q(c^i_p(U^{i-1}))| \le (1 + \rho)a.$$

The assumed lower bound on $\beta$ gives the result that
$$2\rho(1 + \rho)((1 + \rho)(\beta + \epsilon) + \rho\delta) + (1 + \rho)a \le (1 + \rho)(\beta + 2\rho(1 + \rho)(\beta + \delta + \epsilon)).$$ ∎


Here is the main result, bounding the error in the synchronization at any time.

$$C^i_p(t) + T^i + \delta - ARR^i_p(q) \le C^{i+1}_p(t) \le C^i_p(t) + T^i + \delta - ARR^i_p(r).$$

We show the right-hand inequality first. Let $a = c^i_p(ARR^i_p(r))$, the real time at which the message arrives at p from r. Thus, $C^i_p(a) = ARR^i_p(r)$. Note that $C^i_r(a) \ge T^i + (1 - \rho)(\delta - \epsilon)$.

$$C^{i+1}_p(t) \le C^i_p(t) + T^i + \delta - ARR^i_p(r),$$ from above

$$\le C^i_r(t) + C^i_p(a) - C^i_r(a) + T^i + \delta - ARR^i_p(r) + (C^i_p(t) - C^i_r(t)) - (C^i_p(a) - C^i_r(a))$$

$$\le C^i_r(t) + C^i_p(a) - C^i_r(a) + T^i + \delta - ARR^i_p(r) + 2\rho(t - a),$$ by Lemma 2 since $t \ge a$

$$\le C^i_r(t) + ARR^i_p(r) - T^i - (1 - \rho)(\delta - \epsilon) + T^i + \delta - ARR^i_p(r) + 2\rho(t - a)$$

$$= C^i_r(t) + \epsilon + \rho\delta - \rho\epsilon + 2\rho(t - a).$$

It remains to bound $t - a$. The worst case occurs when $t = umax^i$. The longest possible elapsed real time between a particular nonfaulty process reaching $T^i$ and $U^i$ on the same clock is $(1 + \rho)^2(\beta + \delta + \epsilon)$. Thus, $umax^i - tmin^i \le \beta + (1 + \rho)^2(\beta + \delta + \epsilon)$. But $a \ge tmin^i + \delta - \epsilon$. Therefore, $t - a \le \beta + (1 + \rho)^2(\beta + \delta + \epsilon) - \delta + \epsilon$.

Thus,
$$C^{i+1}_p(t) \le C^i_r(t) + \epsilon + \rho\delta - \rho\epsilon + 2\rho(\beta + (1 + \rho)^2(\beta + \delta + \epsilon) - \delta + \epsilon)$$

$$= C^i_r(t) + \epsilon + \rho(4\beta + \delta + 3\epsilon) + 4\rho^2(\beta + \delta + \epsilon) + 2\rho^3(\beta + \delta + \epsilon)$$

$$\le C^i_r(t) + \alpha.$$

For the left-hand inequality, we see that $C^i_q(t) - \epsilon - \rho\delta - \rho\epsilon - 2\rho(t - a) \le C^{i+1}_p(t)$, where $a = c^i_p(ARR^i_p(q))$. The quantity $t - a$ is bounded exactly as before, so that we obtain:

$$C^i_q(t) - \alpha \le C^{i+1}_p(t).$$ ∎

7.  Agreement  and  Validity  Conditions 

We  are  now  ready  to  show  that  the  agreement  and  validity  properties  hold.  The  main  effort  is  in 
restating  bounds  proved  earlier  concerning  the  closeness  in  real  times  when  clocks  reach  the  same 
value,  in  terms  of  the  closeness  of  clock  values  at  the  same  real  time. 

7.1. Agreement

The first lemma implies that the local times of two nonfaulty processes are close in those intervals where both use a clock with the same index.

Lemma 16: Let p, q be nonfaulty. Then

$$|C^i_p(t) - C^i_q(t)| \le (1 + \rho)(\beta + 2\rho(1 + \rho)(\beta + \delta + \epsilon))$$

for $\max\{u^{i-1}_p, u^{i-1}_q\} \le t < \max\{u^i_p, u^i_q\}$, if $i \ge 1$,


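adjustment shows that $ADJ^i_p \le (1 + \rho)(\beta + \epsilon) + \rho\delta$. Therefore,

$$t^{i+1}_p - u^i_p > (3(1 + \rho)(\beta + \epsilon) + \rho\delta - (1 + \rho)(\beta + \delta + \epsilon) - (1 + \rho)(\beta + \epsilon) - \rho\delta) / (1 + \rho)$$

$$= \beta - \delta + \epsilon,$$ as needed. ∎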

Thus,  we  have  shown  that  the  three  inductive  hypotheses  hold.  Therefore,  the  claims  made  in  this 
section  for  a  particular  i,  in  fact  hold  for  all  i. 

6.  Some  General  Properties 

In  this  section,  we  state  several  consequences  of  the  results  proved  in  the  preceding  section. 

First,  we  state  a  bound  on  the  closeness  with  which  the  various  clocks  reach  corresponding  values. 

Lemma 14: Let p, q be nonfaulty, $i \ge 0$. Assume that T is chosen so that $U^{i-1} \le T \le U^i$, if $i \ge 1$, or so that $T^0 \le T \le U^0$, if $i = 0$.

Then $|c^i_p(T) - c^i_q(T)| \le \beta + 2\rho(1 + \rho)(\beta + \delta + \epsilon)$.

Proof: Basis: $i = 0$. Then $T^0 \le T \le U^0$.

$$|c^0_p(T) - c^0_q(T)| \le |(c^0_p(T) - c^0_q(T)) - (c^0_p(T^0) - c^0_q(T^0))| + |c^0_p(T^0) - c^0_q(T^0)|$$

$$\le 2\rho(T - T^0) + \beta,$$ by Lemma 2 and assumption 4

$$\le \beta + 2\rho(1 + \rho)(\beta + \delta + \epsilon).$$

Induction: $i > 0$. Choose T with $U^{i-1} \le T \le U^i$.

$$|c^i_p(T) - c^i_q(T)| \le |(c^i_p(T) - c^i_q(T)) - (c^i_p(U^{i-1}) - c^i_q(U^{i-1}))| + |c^i_p(U^{i-1}) - c^i_q(U^{i-1})|$$

$$\le 2\rho P + \beta/2 + 2\epsilon + 2\rho(3\beta + 2\delta + 3\epsilon) + 4\rho^2(\beta + \delta + \epsilon),$$ by Lemmas 2 and 11.

The upper bound on P implies the result. ∎

Next, we prove a bound for a nonfaulty process' (i + 1)-st clock, in terms of nonfaulty processes' i-th clocks.

Lemma 15: Let p be nonfaulty, $i \ge 0$. Then there exist nonfaulty processes, q and r, such that for $u^i_p \le t \le umax^i$,

$$C^i_q(t) - \alpha \le C^{i+1}_p(t) \le C^i_r(t) + \alpha,$$

where $\alpha = \epsilon + \rho(4\beta + \delta + 5\epsilon) + 4\rho^2(\beta + \delta + \epsilon) + 2\rho^3(\beta + \delta + \epsilon)$.

Proof: $C^{i+1}_p(t) = C^i_p(t) + T^i + \delta - AV^i_p$. Therefore, by Lemma 5 there are nonfaulty processes, q and r, for which

$$|c^{i+1}_p(U^i) - d_p(U^i)| \le |(c^{i+1}_p(U^i) - d_p(U^i)) - (c^{i+1}_p(U^i + ADJ^i_p) - d_p(U^i + ADJ^i_p))|$$

$$\le \rho|ADJ^i_p|,$$ by Lemma 2

$$\le \rho((1 + \rho)(\beta + \epsilon) + \rho\delta),$$ by Lemma 6.

The same bound holds for the third term.

Finally, consider the middle term, $|d_p(U^i) - d_q(U^i)|$. We know that $d_p(U^i) = d_p(U^i + ADJ^i_p) - ADJ^i_p = u^i_p - ADJ^i_p$, and similarly for q.

$$|d_p(U^i) - d_q(U^i)| = |(u^i_p - u^i_q) - (ADJ^i_p - ADJ^i_q)|$$

$$\le \beta/2 + 2\epsilon + 2\rho(2 + \rho)(\beta + \delta + \epsilon),$$ by Lemma 10.

Combining these three bounds, we get the required bound. ∎


Finally, we can show the second of our inductive properties, bounding the distance between times when clocks reach $T^{i+1}$.

Lemma 12: Let p, q be nonfaulty. Then $|t^{i+1}_p - t^{i+1}_q| \le \beta$.

Proof: $|t^{i+1}_p - t^{i+1}_q|$

$$= |c^{i+1}_p(T^{i+1}) - c^{i+1}_q(T^{i+1})|$$

$$\le |(c^{i+1}_p(T^{i+1}) - c^{i+1}_q(T^{i+1})) - (c^{i+1}_p(U^i) - c^{i+1}_q(U^i))| + |c^{i+1}_p(U^i) - c^{i+1}_q(U^i)|$$

$$\le 2\rho(P - (1 + \rho)(\beta + \delta + \epsilon)) + \beta/2 + 2\epsilon + 2\rho(3\beta + 2\delta + 3\epsilon) + 4\rho^2(\beta + \delta + \epsilon),$$ by Lemmas 2 and 11.

The assumed upper bound on P implies that this expression is at most $\beta$. ∎


5.6.  Bound  on  Message  Arrival  Time 

In  this  subsection,  we  show  that  the  third  and  final  inductive  assumption  holds.  That  is,  we  show 
that  messages  arrive  after  the  appropriate  clocks  have  been  set. 

Lemma 13: Let p and q be nonfaulty. Then $t^{i+1}_q + \delta - \epsilon > u^i_p$.

Proof: Since $t^{i+1}_q + \delta - \epsilon \ge t^{i+1}_p - \beta + \delta - \epsilon$, it suffices to show that

$$t^{i+1}_p - u^i_p > \beta - \delta + \epsilon.$$

Now, $t^{i+1}_p - u^i_p \ge (P - (1 + \rho)(\beta + \delta + \epsilon) - ADJ^i_p)/(1 + \rho)$ since the numerator represents the smallest possible difference in the values of the clock $C^{i+1}_p$ at the two given real times.

But the lower bound on P implies that $P > 3(1 + \rho)(\beta + \epsilon) + \rho\delta$. Also, the bound on the


$W = \{c^i_r(T^i) : r \text{ is nonfaulty}\}$.

U and V have size n and W has size n - f.

Let $x = \epsilon + \rho(\beta + \delta + \epsilon)$.

Define an injection from W to U as follows. Map each element $c^i_r(T^i)$ in W to $c^i_p(T^i) - (T^i + \delta) + ARR^i_p(r)$ in U. Since Lemma 8 implies that $|(ARR^i_p(r) - (T^i + \delta)) - (c^i_r(T^i) - c^i_p(T^i))| \le \epsilon + \rho(\beta + \delta + \epsilon)$ for all the elements of W, $d_x(W,U) = 0$. Similarly, $d_x(W,V) = 0$.

Since any two nonfaulty processes reach $T^i$ within $\beta$ real time of each other, $diam(W) \le \beta$.

By Lemma 23, $|mid(reduce(U)) - mid(reduce(V))| \le \beta/2 + 2\epsilon + 2\rho(\beta + \delta + \epsilon)$.

Since $mid(reduce(U)) = mid(reduce(c^i_p(T^i) - (T^i + \delta) + ARR^i_p)) = c^i_p(T^i) - ADJ^i_p$, and similarly $mid(reduce(V)) = c^i_q(T^i) - ADJ^i_q$, the result follows. ∎
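The halving effect of the averaging function can be seen on a small example. The sketch below assumes that reduce discards the f largest and f smallest elements of a multiset and that mid returns the midpoint of the range of what remains, as the Appendix definitions suggest. The function names and the sample values are hypothetical, and a single faulty process is modeled as reporting different extreme values to p and to q.

```python
# Illustration of why mid(reduce(.)) roughly halves the separation.
# reduce_multiset and mid follow the informal description in the text;
# the concrete numbers are made up for the example.

def reduce_multiset(values, f):
    """Discard the f smallest and f largest elements of a multiset."""
    ordered = sorted(values)
    if len(ordered) <= 2 * f:
        raise ValueError("need more than 2f values")
    return ordered[f:len(ordered) - f]

def mid(values):
    """Midpoint of the range of a nonempty multiset."""
    return (min(values) + max(values)) / 2

def fault_tolerant_average(values, f):
    return mid(reduce_multiset(values, f))

f = 1
W = [0.0, 0.4, 1.0]            # values from nonfaulty processes, diam(W) = 1.0
U = W + [100.0]                # what p sees: the faulty process reports a huge value
V = W + [-100.0]               # what q sees: the faulty process reports a tiny value

gap = abs(fault_tolerant_average(U, f) - fault_tolerant_average(V, f))
print(gap)                     # at most diam(W) / 2
```

Here the two averages differ by about half the diameter of W even though the faulty process pulls p and q in opposite directions.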

The next lemma is analogous to the previous one, except that it involves $U^i$ instead of $T^i$.

Lemma 10: Let p and q be nonfaulty. Then

$$|(c^i_p(U^i) - c^i_q(U^i)) - (ADJ^i_p - ADJ^i_q)| \le \beta/2 + 2\epsilon + 2\rho(2 + \rho)(\beta + \delta + \epsilon).$$

Proof: The given expression is

$$\le |(c^i_p(T^i) - c^i_q(T^i)) - (ADJ^i_p - ADJ^i_q)| + |(c^i_p(U^i) - c^i_q(U^i)) - (c^i_p(T^i) - c^i_q(T^i))|$$

$$\le \beta/2 + 2\epsilon + 2\rho(\beta + \delta + \epsilon) + 2\rho(1 + \rho)(\beta + \delta + \epsilon),$$ by Lemmas 9 and 2.

This reduces to the claimed expression. ∎

Next we bound the distance in real time between two nonfaulty processes switching to their new clocks. It is crucial that the distance between the new clocks reaching $U^i$ be less than $\beta$ in order to accommodate their relative drift during the interval between $U^i$ and $T^{i+1}$.

Lemma 11: Let p, q be nonfaulty. Then

$$|c^{i+1}_p(U^i) - c^{i+1}_q(U^i)| \le \beta/2 + 2\epsilon + 2\rho(3\beta + 2\delta + 3\epsilon) + 4\rho^2(\beta + \delta + \epsilon).$$

Proof: We define idealized clocks, $D_p$ and $D_q$, as follows. Both have rate exactly 1. Also, $D_p(u^i_p) = C^{i+1}_p(u^i_p) = U^i + ADJ^i_p$, and similarly for q. Then

$$|c^{i+1}_p(U^i) - c^{i+1}_q(U^i)| \le |c^{i+1}_p(U^i) - d_p(U^i)| + |d_p(U^i) - d_q(U^i)| + |d_q(U^i) - c^{i+1}_q(U^i)|.$$

We bound each of these three terms separately.

First, consider $|c^{i+1}_p(U^i) - d_p(U^i)|$. Now, $U^i + ADJ^i_p = D_p(u^i_p) = C^{i+1}_p(u^i_p)$. So


Next we show that the second term, $|c^i_p(T^i) - d(T^i)|$, is at most $\rho(\beta + \delta + \epsilon)$.

Case 1: $c^i_p(T^i) \le a$. So p reaches $T^i$ before q's message arrives.

Let $y = a - c^i_p(T^i)$. Then $y \le \beta + \delta + \epsilon$.

Subcase 1a: $d(T^i) \ge c^i_p(T^i)$. So $C^i_p$ has rate slower than real time.

Then $d(T^i) - c^i_p(T^i)$ is largest when $C^i_p$ goes at the slowest possible rate, $1/(1 + \rho)$. In this case, $d(T^i) - c^i_p(T^i) = y - (a - d(T^i))$, where $a - d(T^i) = y/(1 + \rho)$. Thus, $d(T^i) - c^i_p(T^i) = y(1 - 1/(1 + \rho)) = y\rho/(1 + \rho) \le y\rho \le \rho(\beta + \delta + \epsilon)$.

Subcase 1b: $d(T^i) < c^i_p(T^i)$. So $C^i_p$ has rate faster than real time.

Then $c^i_p(T^i) - d(T^i)$ is largest when $C^i_p$ goes at the fastest possible rate, $1 + \rho$. Then $c^i_p(T^i) - d(T^i) = y(1 + \rho) - y = y\rho \le \rho(\beta + \delta + \epsilon)$.

Case 2: $c^i_p(T^i) > a$. So p reaches $T^i$ after q's message arrives.

Let $y = c^i_p(T^i) - a$. Then $y \le \beta - \delta + \epsilon$.

Subcase 2a: $d(T^i) \ge c^i_p(T^i)$. So $C^i_p$ has rate faster than real time.

An argument similar to that for case 1b shows that $d(T^i) - c^i_p(T^i) \le y\rho \le \rho(\beta - \delta + \epsilon)$, which suffices.

Subcase 2b: $d(T^i) < c^i_p(T^i)$. So $C^i_p$ has rate slower than real time.

An argument similar to that for case 1a shows that $c^i_p(T^i) - d(T^i) \le y\rho \le \rho(\beta - \delta + \epsilon)$, which suffices. ∎


In order to prove the next lemma, we use some results about multisets, which are presented in the Appendix. This is a key lemma because the distance between the clocks is reduced from $\beta$ to $\beta/2$, in a rough sense. The halving is due to the properties of the fault-tolerant averaging function used in the algorithm. Consequently, the averaging function can be considered the heart of the algorithm.

Lemma 9: Let p and q be nonfaulty. Then

$$|(c^i_p(T^i) - c^i_q(T^i)) - (ADJ^i_p - ADJ^i_q)| \le \beta/2 + 2\epsilon + 2\rho(\beta + \delta + \epsilon).$$

Proof: We define multisets U, V, and W, and show they satisfy the hypotheses of Lemma 23. Let

$U = c^i_p(T^i) - (T^i + \delta) + ARR^i_p$,

$V = c^i_q(T^i) - (T^i + \delta) + ARR^i_q$, and
The conclusion is immediate.


5.4.  Timers  Are  Set  in  the  Future 

Earlier,  we  gave  a  lower  bound  on  P  and  described  two  conditions  which  that  bound  was  supposed 
to  guarantee  (that  timers  are  set  in  the  future  and  that  messages  arrive  after  the  appropriate  clocks 
have  been  set).  In  this  subsection,  we  show  that  the  given  bound  on  P  is  sufficient  to  guarantee  that 
the  first  of  these  two  conditions  holds. 

Lemma 7: Let p be nonfaulty. Then $U^i + ADJ^i_p < T^{i+1}$.

Proof: $U^i + ADJ^i_p \le U^i + (1 + \rho)(\beta + \epsilon) + \rho\delta$, by Lemma 6

$$= U^i + (2(1 + \rho)(\beta + \epsilon) + (1 + \rho)\delta + \rho\delta) - (1 + \rho)(\beta + \delta + \epsilon)$$

$$< U^i + P - (1 + \rho)(\beta + \delta + \epsilon),$$ by the assumed lower bound on P

$$= T^{i+1}.$$ ∎

This lemma implies that timers are set in the future and that $t^{i+1}_p$ is defined, the first of the three inductive properties which we must verify.

5.5.  Bounding  the  Separation  of  Clocks 

Next, we prove several lemmas which lead to bounds on the distance between the new clocks of nonfaulty processes. The first lemma gives an upper bound on the error in a process' estimate of the difference in real time between its own clock and another nonfaulty process' clock reaching $T^i$.

Lemma 8: Let p, q and r be nonfaulty. Then

$$|(ARR^i_p(q) - (T^i + \delta)) - (c^i_q(T^i) - c^i_p(T^i))| \le \epsilon + \rho(\beta + \delta + \epsilon).$$

Proof: Let a be the real time of arrival of q's message at process p. Then a is at most $c^i_q(T^i) + \delta + \epsilon$. Define a new auxiliary clock, D, with rate exactly equal to 1, and such that $D(a) = C^i_p(a)$. Thus, $ARR^i_p(q) = D(a)$. So the expression we want to bound is at most equal to:

$$|(D(a) - (T^i + \delta)) - (c^i_q(T^i) - d(T^i))| + |c^i_p(T^i) - d(T^i)|.$$

First we demonstrate that the first of these two terms is at most $\epsilon$.

$$|D(a) - (T^i + \delta) - c^i_q(T^i) + d(T^i)|$$

$$= |a - d(T^i + \delta) - c^i_q(T^i) + d(T^i)|,$$ since D has rate 1

$$= |a - c^i_q(T^i) + T^i - (T^i + \delta)|$$

$$\le |c^i_q(T^i) + \delta + \epsilon - c^i_q(T^i) - \delta| = \epsilon.$$


(2) $|t^i_p - t^i_q| \le \beta$, for all nonfaulty p and q. (That is, the separation of clocks is bounded by $\beta$.)

(3) $t^i_p + \delta - \epsilon > u^{i-1}_q$, for all nonfaulty p and q, and $i \ge 1$. (That is, messages arrive after the appropriate clocks have been set.)

The  proof  is  by  induction.  For  i  =  0,  (1)  and  (2)  are  true  by  assumption  and  (3)  is  vacuously  true. 

Throughout  the  rest  of  this  section,  we  assume  (1),  (2),  and  (3)  hold  for  i.  We  show  (1),  (2),  and  (3) 
for  i  +  1  after  bounding  the  size  of  the  adjustment  at  each  round. 




5.3.  Bounding  the  Adjustment 

In this subsection, we prove several lemmas leading up to a bound on the amount of adjustment made by a nonfaulty process to its clock, at each time of resynchronization.

Lemma 4: Let p and q be nonfaulty.

(a) $ARR^i_p(q) \le T^i + (1 + \rho)(\beta + \delta + \epsilon)$.

(b) If $\delta - \epsilon \ge \beta$, then $ARR^i_p(q) \ge T^i + (1 - \rho)(\delta - \epsilon - \beta)$.

(c) If $\delta - \epsilon \le \beta$, then $ARR^i_p(q) \ge T^i - (1 + \rho)(\beta - \delta + \epsilon)$.

Proof: Straightforward using Lemma 1. ∎

Lemma 5: Let p be nonfaulty. Then there exist nonfaulty q and r with

$$ARR^i_p(q) \le AV^i_p \le ARR^i_p(r).$$

Proof: By throwing out the f highest and f lowest values, the process ensures that the remaining values are in the range of the nonfaulty processes' values. ∎

We are now able to bound the adjustment.

Lemma 6: Let p be nonfaulty. Then $|ADJ^i_p| \le (1 + \rho)(\beta + \epsilon) + \rho\delta$.

Proof: $ADJ^i_p = T^i + \delta - AV^i_p$.

Thus, for some nonfaulty q and r, Lemma 5 implies that

$$T^i + \delta - ARR^i_p(q) \le ADJ^i_p \le T^i + \delta - ARR^i_p(r).$$

Then Lemma 4 implies that:

(a) $ADJ^i_p \ge T^i + \delta - (T^i + (1 + \rho)(\beta + \delta + \epsilon)) = -(1 + \rho)(\beta + \epsilon) - \rho\delta$.

(b) If $\delta - \epsilon \ge \beta$, then $ADJ^i_p \le T^i + \delta - (T^i + (1 - \rho)(\delta - \epsilon - \beta)) = (1 - \rho)(\beta + \epsilon) + \rho\delta$.

(c) If $\delta - \epsilon \le \beta$, then $ADJ^i_p \le T^i + \delta - (T^i - (1 + \rho)(\beta - \delta + \epsilon)) = (1 + \rho)(\beta + \epsilon) - \rho\delta$.
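Lemmas 5 and 6 can be illustrated with one resynchronization step. The following sketch assumes that AV is the midpoint of the arrival times left after discarding the f highest and f lowest, and that ADJ = T + delta - AV, as in the proof above. All names and numbers are hypothetical, with the nonfaulty arrival times chosen inside the range allowed by Lemma 4.

```python
# One resynchronization step for a process p, following the description in the text.
# Arrival times are measured on p's logical clock; one value comes from a faulty process.

def reduce_multiset(values, f):
    """Discard the f smallest and f largest elements."""
    ordered = sorted(values)
    if len(ordered) <= 2 * f:
        raise ValueError("need more than 2f values")
    return ordered[f:len(ordered) - f]

def mid(values):
    """Midpoint of the range of a nonempty multiset."""
    return (min(values) + max(values)) / 2

def adjustment(arrivals, T, delta, f):
    """ADJ = T + delta - AV, with AV = mid(reduce(arrivals))."""
    av = mid(reduce_multiset(arrivals, f))
    return T + delta - av

def adjustment_bound(rho, beta, delta, eps):
    """Right-hand side of Lemma 6."""
    return (1 + rho) * (beta + eps) + rho * delta

rho, beta, delta, eps, f = 1e-6, 0.005, 0.01, 0.001, 1
T = 100.0
nonfaulty = [100.006, 100.011, 100.014]   # within the range given by Lemma 4
arrivals = nonfaulty + [250.0]            # arbitrary value from a faulty process

adj = adjustment(arrivals, T, delta, f)
print(adj, adjustment_bound(rho, beta, delta, eps))
```

In this example the faulty value is discarded, AV falls between the smallest and largest nonfaulty arrival times, and the magnitude of the adjustment stays below the Lemma 6 bound.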
$$P > 2(1 + \rho)(\beta + \epsilon) + (1 + \rho)\max\{\delta,\ \beta + \epsilon\} + \rho\delta,$$ and

$$P \le \beta/4\rho - \epsilon/\rho - \rho(\beta + \delta + \epsilon) - 2\beta - \delta - 2\epsilon.$$

A required lower bound on $\beta$ is $\beta \ge 4\epsilon + 4\rho(3\beta + \delta + 3\epsilon) + 8\rho^2(\beta + \delta + \epsilon)$.

Any combination of P and $\beta$ which satisfies these inequalities will work in our algorithm. If P is regarded as fixed, then $\beta$, the closeness of synchronization along the real time axis, is roughly $4\epsilon + 4\rho P$. This value is obtained by solving the upper bound on P for $\beta$ and neglecting terms of order $\rho$.
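These constraints are easy to evaluate for concrete parameters. The helper below is a minimal sketch, assuming the three inequalities exactly as stated above; the function names and the sample values are illustrative and not taken from the report.

```python
# Evaluate the bounds on the round length P and the lower bound on beta.
# rho: drift rate, delta: typical message delay, eps: delay uncertainty,
# beta: closeness of synchronization along the real-time axis.

def p_lower_bound(rho, delta, eps, beta):
    return 2 * (1 + rho) * (beta + eps) + (1 + rho) * max(delta, beta + eps) + rho * delta

def p_upper_bound(rho, delta, eps, beta):
    return beta / (4 * rho) - eps / rho - rho * (beta + delta + eps) - 2 * beta - delta - 2 * eps

def beta_is_large_enough(rho, delta, eps, beta):
    return beta >= 4 * eps + 4 * rho * (3 * beta + delta + 3 * eps) + 8 * rho**2 * (beta + delta + eps)

def feasible(P, rho, delta, eps, beta):
    """True if P and beta satisfy all three inequalities."""
    return (beta_is_large_enough(rho, delta, eps, beta)
            and p_lower_bound(rho, delta, eps, beta) < P <= p_upper_bound(rho, delta, eps, beta))

rho, delta, eps, beta = 1e-6, 0.01, 0.001, 0.005
print(p_lower_bound(rho, delta, eps, beta), p_upper_bound(rho, delta, eps, beta))
print(feasible(60.0, rho, delta, eps, beta))
```

With these sample values the admissible range for P is wide; shrinking $\beta$ toward $4\epsilon$ narrows it until no P is admissible.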

5.2.  Notation 

Let $T^i = T^0 + iP$ and $U^i = T^i + (1 + \rho)(\beta + \delta + \epsilon)$, for all $i \ge 0$.

For each i, every process p broadcasts $T^i$ at its logical clock time $T^i$ (real time $t^i_p$) and sets a timer to go off when its logical clock reaches $U^i$. When the logical clock reaches $U^i$ (at real time $u^i_p$), the process resets its CORR variable, thereby switching to a new logical clock, denoted $C^{i+1}_p$. Also at real time $u^i_p$, the process sets a timer for the time on its physical clock when the new logical clock $C^{i+1}_p$ reaches $T^{i+1}$. It is at least theoretically possible that this new timer might be set for a time on the physical clock which has already passed. If the timer is never set in the past, the process moves through an infinite sequence of clocks $C^0_p$, $C^1_p$, etc., where $C^0_p$ is in force in the interval of real time $(-\infty, u^0_p)$, and each $C^i_p$, $i \ge 1$, is in force in the interval of real time $[u^{i-1}_p, u^i_p)$. If, however, the timer is set in the past at some $u^i_p$, then no further timers arrive after that real time, and no further resynchronizations occur. That is, $C^{i+1}_p$ stays in force forever, and $u^j_p$ and $t^j_p$ are undefined for $j \ge i + 1$.

Let $tmin^i$ denote $\min_{p\ \mathrm{nonfaulty}}\{t^i_p\}$, and analogously for $tmax^i$, $umin^i$ and $umax^i$.

For p and q nonfaulty, let $ARR^i_p(q)$ denote the time of arrival of a $T^i$ message from q to p, sent at q's clock time $T^i$, where the arrival time is measured on p's local clock $C^i_p$. (We will prove that $C^i_p$ has actually been set by the time this message arrives.) Let $AV^i_p$ denote the value of AV calculated by p using the $ARR^i_p$ values, and let $ADJ^i_p$ denote the corresponding value of ADJ calculated by p. Thus, $C^{i+1}_p = C^i_p + ADJ^i_p$.

This section is devoted to proving the following three statements for all $i \ge 0$:

(1) The real time $t^i_p$ is defined for all nonfaulty p. (That is, timers are set in the future.)
variables, and the location counter positioned at the first beginstep statement of the program.

The  transition  function  takes  as  inputs  a  state  of  the  process,  a  message,  and  a  physical  time,  and 
must  return  a  new  state  and  a  collection  of  messages  to  send  and  timers  to  set.  This  is  done  as 
follows.  The  beginstep  statement  is  extracted  from  the  given  state.  The  local  variables  are  initialized 
at  the  values  given  in  the  state.  The  parameter  u  is  set  equal  to  the  message.  The  variable  NOW  is 
initialized  at  the  given  physical  time  +  CORR.  The  program  is  then  run  from  the  given  beginstep 
statement,  just  until  it  reaches  an  endstep  statement.  (If  it  never  reaches  an  endstep  statement,  the 
transition  function  takes  on  a  default  value.)  The  next  beginstep  after  that  endstep,  together  with  the 
new  values  for  all  the  local  variables  resulting  from  running  the  program,  comprise  the  new  state.  The 
messages  sent  are  all  those  which  are  sent  during  the  running  of  the  program,  and  similarly  for  the 
timers.  The  set-timer  statement  takes  an  argument  U  which  represents  a  logical  time.  The 
corresponding  physical  time,  U  -  CORR,  is  the  physical  time  which  is  described  by  the  transition 
function. 
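The conversion between logical and physical time used by the transition function can be sketched as follows. This is only an illustration of the two arithmetic rules stated above (NOW is physical time plus CORR, and a timer for logical time U is set for physical time U - CORR); the function names are hypothetical.

```python
# Logical time is the physical clock reading plus the correction variable CORR.

def logical_now(physical_time, corr):
    """Value of NOW at the start of a step."""
    return physical_time + corr

def timer_physical_time(logical_target, corr):
    """Physical clock time at which a timer for logical time U should fire."""
    return logical_target - corr

corr = 2.5
fire_at = timer_physical_time(105.0, corr)   # physical time 102.5
print(fire_at, logical_now(fire_at, corr))   # the logical clock reads 105.0 when the timer fires
```

Because CORR only shifts the clock by a constant, a timer expressed in logical time always maps to a single physical time for a given value of CORR.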

5.  Inductive  Analysis 

Although  the  algorithm  is  fairly  simple,  its  analysis  is  surprisingly  complicated  and  requires  a  long 
series  of  lemmas. 

5.1 .  Bounds  on  the  Parameters 

We assume that the parameters $\rho$, $\delta$, and $\epsilon$ are fixed, but that we have some freedom in our choice of P and $\beta$, subject to the reasonableness of our assumption that the clocks are initially synchronized to within $\beta$. We would like $\beta$ to be as small as possible, to keep the clocks as closely synchronized as we can. However, the smaller $\beta$ is, the smaller P must be (i.e., the more frequently we must synchronize).

There  is  also  a  lower  bound  on  P.  In  order  for  the  algorithm  to  work  correctly,  we  need  to  have  P 
sufficiently  large  to  ensure  the  following. 

(1)  After  a  nonfaulty  process  p  resets  its  clock,  the  local  time  at  which  p  schedules  its  next 
broadcast  is  greater  than  the  local  time  on  the  new  clock,  at  the  moment  of  reset. 

(2)  A  message  sent  by  a  nonfaulty  process  q  for  a  round  arrives  at  a  nonfaulty  process  p  after  p  has 
already  set  its  clock  for  that  round. 


Sufficient  bounds  on  P  turn  out  to  be: 
observe that it does not matter that p adjusts its correction variable whenever it is ready (rather than at the time specified for correct processes in the ordinary algorithm). The adjustment is only the addition of a constant, so the (additive) effect of the change is the same in either case.

(It is also necessary to argue that when p resets its clock, the new clock has not already reached $T^{i+1}$. We assume that P is big enough to ensure this. We haven't shown that the lower bound on P given earlier is sufficient.)

9.  Establishing  Synchronization 

In  this  section  we  present  an  algorithm  to  synchronize  clocks  in  a  distributed  system  of  processes, 
assuming  the  clocks  initially  have  arbitrary  values.  The  algorithm  handles  Byzantine  failures  of  the 
processes,  uncertainty  in  the  message  delivery  time,  and  clock  drift.  We  envision  the  processes 
running  this  algorithm  until  the  desired  degree  of  synchronization  is  obtained,  and  then  switching  to 
the  maintenance  algorithm. 

9.1 .  Algorithm 

The  structure  of  the  algorithm  is  similar  to  that  of  the  algorithm  which  maintains  synchronization.  It 
runs  in  rounds.  During  each  round,  the  processes  exchange  clock  values  and  use  the  same  fault- 
tolerant  averaging  function  as  before  to  calculate  the  corrections  to  their  clocks.  However,  each 
round  contains  an  additional  phase,  in  which  the  processes  exchange  messages  to  decide  that  they 
are  ready  to  begin  the  next  round.  A  more  detailed  description  follows. 

Nonfaulty processes will begin each round within real time $\delta + 3\epsilon$ of each other. At the beginning
of  each  round,  each  nonfaulty  process  p  broadcasts  its  local  time.  Then  p  waits  a  certain  length  of 
time  guaranteed  to  be  long  enough  for  it  to  receive  a  similar  message  from  each  nonfaulty  process. 
At  the  end  of  this  waiting  interval,  p  calculates  the  adjustment  it  will  make  to  its  clock  at  the  current 
round,  but  does  not  make  the  adjustment  yet. 
Then p waits a second interval of time before sending out additional messages, to make sure that
these  new  messages  are  not  received  before  the  other  nonfaulty  processes  have  reached  the  end  of 
their  first  waiting  intervals.  At  the  end  of  its  second  waiting  interval,  p  broadcasts  a  READY  message 
indicating  that  it  is  ready  to  begin  the  next  round.  However,  if  p  receives  f  +  1  READY  messages 
during  its  second  waiting  interval,  it  terminates  its  second  interval  early,  and  goes  ahead  and 
broadcasts  READY.  As  soon  as  p  receives  n  -  f  READY  messages,  it  updates  the  clock  according  to 
the  adjustment  calculated  earlier,  and  begins  its  next  round  by  broadcasting  its  new  clock  value. 
(This  algorithm  uses  some  ideas  from  [DLS].) 
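The READY phase described above can be sketched as a small event handler. This is a hedged illustration rather than the report's program: the class and action names are hypothetical, timers and message transport are left out, READY messages are assumed to arrive only after the first waiting interval has ended, and a process is assumed to count at most one READY per sender.

```python
# Sketch of the READY phase of one round for a single process.
# n: number of processes, f: maximum number of faulty processes.

class ReadyPhase:
    def __init__(self, n, f):
        self.n = n
        self.f = f
        self.ready_from = set()   # senders whose READY has been received
        self.sent_ready = False
        self.round_done = False

    def _send_ready(self):
        if self.sent_ready:
            return []
        self.sent_ready = True
        return ["broadcast READY"]

    def on_second_interval_end(self):
        """The second waiting interval ran to completion."""
        return self._send_ready()

    def on_ready(self, sender):
        """A READY message arrived from `sender`."""
        actions = []
        self.ready_from.add(sender)
        # f + 1 READY messages end the second waiting interval early.
        if len(self.ready_from) >= self.f + 1:
            actions += self._send_ready()
        # n - f READY messages let the process apply ADJ and start the next round.
        if not self.round_done and len(self.ready_from) >= self.n - self.f:
            self.round_done = True
            actions.append("apply adjustment and begin next round")
        return actions

phase = ReadyPhase(n=4, f=1)
print(phase.on_ready("q1"))   # []
print(phase.on_ready("q2"))   # ['broadcast READY']
print(phase.on_ready("q3"))   # ['apply adjustment and begin next round']
```

The f + 1 threshold guarantees that at least one nonfaulty process has sent READY, and the n - f threshold can always be met by the nonfaulty processes alone.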
It is apparent that a process need only keep clock differences for one round at a time. The waiting
intervals  are  designed  so  that  during  round  i  a  nonfaulty  process  p  will  not  receive  a  READY  message 
from  another  nonfaulty  process  until  p  has  finished  collecting  round  i  clock  values.  Round  i  +  1 
clock  values  are  not  broadcast  until  after  READY  is  broadcast,  so  p  will  certainly  not  receive  round  i 
+  1  clock  values  until  after  it  has  finished  collecting  round  i  clock  values. 

Let $B^i$ be the maximum difference between nonfaulty clock values at the latest real time when a nonfaulty process begins round i. Ignoring terms of order $\rho^2$, we can bound $B^{i+1}$ in terms of $B^i$ as follows:

$$B^{i+1} \le B^i/2 + 2\epsilon + 2\rho(13\delta + 43\epsilon).$$

The  idea  of  the  proof  is  similar  to  the  proof  of  Theorem  17.  Again,  the  fault-tolerant  averaging 
function  used  in  the  algorithm  causes  the  difference  to  be  approximately  halved  at  each  round. 

By considering the limit of $B^i$ as the round number increases without bound, we can show that the algorithm achieves a closeness of synchronization of about $4\epsilon + 4\rho(13\delta + 43\epsilon)$.
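The limit can be checked by treating the bound as a recurrence. The sketch below assumes the worst case in which the inequality holds with equality at every round; the function names and sample values are hypothetical.

```python
# Worst-case evolution of the bound B^i under
#   B^{i+1} = B^i / 2 + 2*eps + 2*rho*(13*delta + 43*eps).

def next_bound(B, eps, rho, delta):
    return B / 2 + 2 * eps + 2 * rho * (13 * delta + 43 * eps)

def limit_bound(eps, rho, delta):
    """Fixed point of the recurrence: 4*eps + 4*rho*(13*delta + 43*eps)."""
    return 4 * eps + 4 * rho * (13 * delta + 43 * eps)

def rounds_needed(B0, target, eps, rho, delta):
    """Number of rounds after which the worst-case bound is at most `target`."""
    if target <= limit_bound(eps, rho, delta):
        raise ValueError("target is not reachable under this bound")
    B, rounds = B0, 0
    while B > target:
        B = next_bound(B, eps, rho, delta)
        rounds += 1
    return rounds

eps, rho, delta = 1e-3, 1e-6, 1e-2
print(limit_bound(eps, rho, delta))
print(rounds_needed(10.0, 0.01, eps, rho, delta))
```

Since the distance to the limit halves each round, the number of rounds grows only logarithmically in the initial spread.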

As for the maintenance algorithm, if we use the mean instead of the midpoint in this algorithm, we can approach an error of about $2\epsilon$ as n increases and f remains fixed.

9.2. Determining the Number of Rounds

The nonfaulty processes must determine how many rounds of this algorithm must be run to establish the desired degree of synchronization before switching to the maintenance algorithm. The basic idea is for each nonfaulty process p to estimate $B^0$, and then calculate a sufficient number of rounds, $NROUNDS_p$, using the known rate of convergence. $B^0$ is estimated by having p calculate an overestimate and an underestimate for $C^0_q(tmax^0)$ for each q, and letting the estimated $B^0$ be the difference between the maximum overestimate and the minimum underestimate.

Now  each  process  does  Byzantine  Agreement  on  the  vector  of  NROUNDS  values,  one  for  each 
process.  The  processes  are  guaranteed  to  have  the  same  vector  at  the  end  of  the  Byzantine 
Agreement  protocol.  Each  process  chooses  the  (f  + 1  )-st  smallest  element  of  the  resulting  vector  as 
the  required  number  of  rounds.  The  justification  is  as  follows:  the  smallest  number  of  rounds 
computed  by  a  nonfaulty  process  will  suffice  to  achieve  the  desired  closeness  of  synchronization. 
Variations  in  the  number  of  rounds  computed  by  different  nonfaulty  processes  are  due  to  spurious 
values introduced by faulty processes and to different message delays. However, the range computed by any nonfaulty process is guaranteed to include the actual values of all nonfaulty processes at $tmax^0$, so the range determined by the process that computes the smallest number of rounds also includes all the actual values. In order to guarantee that each process chooses a number of rounds that is at least as large as the smallest one computed by a nonfaulty process, it chooses the (f + 1)-st smallest element of the vector of values.
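The selection rule is a one-line computation once the agreed vector is available. In this sketch the vector is assumed to be the common output of the Byzantine Agreement protocol; the function name and the sample vectors are hypothetical.

```python
# Choose the (f + 1)-st smallest entry of the agreed NROUNDS vector.
# At most f entries come from faulty processes, so the chosen value is
# at least the smallest value computed by some nonfaulty process.

def agreed_round_count(nrounds_vector, f):
    if len(nrounds_vector) < f + 1:
        raise ValueError("vector must have at least f + 1 entries")
    return sorted(nrounds_vector)[f]

f = 1
print(agreed_round_count([7, 8, 9, 0], f))     # faulty entry 0 is skipped -> 7
print(agreed_round_count([7, 8, 9, 100], f))   # faulty entry 100 is irrelevant -> 8
```

In both runs the result is no smaller than 7, the smallest count computed by a nonfaulty process in the example.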

Any  Byzantine  Agreement  protocol  requires  at  least  f  +  1  rounds.  The  processes  can  execute  this 
algorithm  in  parallel  with  the  clock  synchronization  algorithm,  beginning  at  round  0.  The  clock 
synchronization  algorithm  imposes  a  round  structure  on  the  processes’  communications.  The 
Byzantine Agreement algorithm can be executed using this round structure. Each BA message can also include information needed for the clock synchronization algorithm (namely, the current clock value). However, the processes will always need to do at least f + 2 rounds, one to obtain the estimated number of rounds and f + 1 for the Byzantine Agreement algorithm.

9.3.  Switching  to  the  Maintenance  Algorithm 

After  the  processes  have  done  the  required  number  of  rounds,  say  r,  of  this  algorithm  to  establish 
synchronization, they must begin the maintenance algorithm. Remember that that algorithm works by having each process broadcast its clock value when its clock reaches $T^i$, for i = 0, 1, ..., where $T^{i+1} = T^i + P$. Let $T^0$ be a multiple of P. The processes should begin the maintenance algorithm as soon as possible in order to minimize the inaccuracy introduced by the clock drift.

It  can  be  shown  that  the  first  multiple  of  P  reached  by  nonfaulty  p’s  clock  after  finishing  the  required 
r  rounds  differs  by  at  most  one  from  the  first  multiple  reached  by  nonfaulty  q’s  clock  after  the  r 
rounds.  When  the  first  multiple  of  P  is  reached,  each  process  broadcasts  its  clock  value  as  in  the 
maintenance  algorithm,  but  doesn't  update  its  clock.  At  the  second  multiple  of  P,  each  process 
begins  the  full  maintenance  algorithm  by  broadcasting  its  clock  value  and  updating  its  clock.  (It  will 
receive  clock  values  from  all  nonfaulty  processes.)  There  will  be  a  lag  of  at  most  one  round  between 
any two nonfaulty processes' beginning the maintenance algorithm. Then β, the difference in real
time between two nonfaulty processes reaching T^i, can be calculated from B^r, the fact that all
processes begin the algorithm at most 2P in clock time after tmax^r, and the result of Lemma 15 that
clocks that are reset one round early don't change by too much. This β will be slightly larger than the
smallest one maintainable. To shrink it back down, P can be made slightly smaller than required by
the maintenance algorithm.
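
The timing of the switch can be sketched as below. This is an illustration under stated assumptions, not the paper's procedure: the function names are hypothetical, and a clock that finishes exactly on a multiple of P is assumed to wait for the next multiple. The first returned time is the broadcast-only step and the second is the start of the full maintenance algorithm; the second is at most 2P after the clock value at which the r rounds finished.

```python
import math


def first_multiple_after(clock_value, P):
    """Smallest multiple of P strictly greater than clock_value (assumed tie rule)."""
    return (math.floor(clock_value / P) + 1) * P


def switch_times(clock_after_rounds, P):
    """Clock times for (broadcast without update, full maintenance start)."""
    first = first_multiple_after(clock_after_rounds, P)
    return first, first + P


# Two nonfaulty processes whose clocks finish the r rounds on either side
# of a multiple of P = 10 start one round apart.
p_times = switch_times(29.8, 10)  # (30, 40)
q_times = switch_times(30.1, 10)  # (40, 50)
```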

Mike  Fischer  has  suggested  using  only  the  algorithm  to  establish  synchronization  and  not  using  the 
maintenance  algorithm  at  all.  Further  work  is  needed  to  investigate  this  idea;  however,  it  may  be 
reasonable since both algorithms synchronize to approximately 4ε.
Appendix

This  Appendix  consists  of  definitions  and  lemmas  concerning  multisets  needed  for  the  proof  of 
Lemma  9.  These  lemmas  are  analogous  to  some  in  [DLPSW]. 

A multiset U is a finite collection of real numbers in which the same number may appear more than
once. The largest value in U is denoted max(U), and the smallest value in U is denoted min(U). The
diameter of U, diam(U), is max(U) - min(U). Let s(U) be the multiset obtained by deleting one
occurrence of min(U), and l(U) be the multiset obtained by deleting one occurrence of max(U). If |U|
≥ 2f + 1, we define reduce(U) to be l^f s^f(U), the result of removing the f largest and f smallest elements
of U.
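
These operations are easy to state concretely. The sketch below represents a multiset as a Python list; the function names drop_min, drop_max, and reduce_multiset are illustrative choices for the two one-element deletion operators and for reduce, and the sample values are made up.

```python
def drop_min(U):
    """Multiset obtained by deleting one occurrence of min(U)."""
    return sorted(U)[1:]


def drop_max(U):
    """Multiset obtained by deleting one occurrence of max(U)."""
    return sorted(U)[:-1]


def diam(U):
    """Diameter of U: max(U) - min(U)."""
    return max(U) - min(U)


def reduce_multiset(U, f):
    """Remove the f largest and f smallest elements of U."""
    if len(U) < 2 * f + 1:
        raise ValueError("reduce needs at least 2f + 1 elements")
    V = sorted(U)
    return V[f:len(V) - f]


# Hypothetical clock readings with two extreme outliers, f = 2.
U = [5, 1, 9, 1, 7, 100, -40]
R = reduce_multiset(U, 2)  # [1, 5, 7]
```

Applying drop_min and drop_max f times each gives the same result as reduce_multiset, and in this example the diameter falls from 140 to 6 once the outliers are discarded.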

Given two multisets U and V with |U| ≤ |V|, consider an injection c mapping U to V. For any
nonnegative real number x, define S_x(c) to be {u ∈ U: |u - c(u)| > x}. We define the x-distance between
U and V to be d_x(U,V) = min_c {|S_x(c)|}. We say c witnesses d_x(U,V) if |S_x(c)| = d_x(U,V). The
x-distance between U and V is the number of elements of U that cannot be matched up with an element
of V which is the same to within x. If |u - c(u)| ≤ x, then we say u and c(u) are x-paired by c.
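
One way to compute the x-distance is sketched below. The paper does not give a procedure; this sketch assumes the standard greedy two-pointer matching on sorted lists, which finds a largest set of disjoint pairs that agree to within x when the values lie on a line. Elements of U left unpaired can be sent to any unused elements of V, so the x-distance is the size of U minus the number of pairs. The function name and sample values are illustrative.

```python
def x_distance(U, V, x):
    """Number of elements of U that cannot be x-paired with distinct elements of V."""
    if len(U) > len(V):
        raise ValueError("an injection from U to V needs |U| <= |V|")
    us, vs = sorted(U), sorted(V)
    i = j = paired = 0
    while i < len(us) and j < len(vs):
        if abs(us[i] - vs[j]) <= x:
            # us[i] and vs[j] agree to within x: pair them.
            paired += 1
            i += 1
            j += 1
        elif vs[j] < us[i]:
            # vs[j] is too small for us[i] and for every later element of U.
            j += 1
        else:
            # us[i] is too small for vs[j] and for every later element of V.
            i += 1
    return len(us) - paired


U = [0.0, 1.0, 5.0]
V = [0.25, 1.0, 9.0, 4.0]
d_half = x_distance(U, V, 0.5)  # 1: the value 5.0 has no partner within 0.5
d_one = x_distance(U, V, 1.0)   # 0: 5.0 can now be paired with 4.0
```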

The midpoint of U, mid(U), is ½[max(U) + min(U)].

For any multiset U and real number r, define U + r to be the multiset obtained by adding r to every
element of U; that is, U + r = {u + r: u ∈ U}. It is obvious that mid and reduce are invariant under
this operation, in the sense that mid(U + r) = mid(U) + r and reduce(U + r) = reduce(U) + r.

The  next  lemma  bounds  the  diameter  of  a  reduced  multiset. 

Lemma 21: Let U and W be multisets such that |U| = |W| = n and d_x(U,W) ≤ f, where n
≥ 2f + 1. Then max(reduce(U)) ≤ max(W) + x and min(reduce(U)) ≥ min(W) - x.

Proof: We show the result for max; a similar argument holds for min. Let c witness
d_x(U,W). Suppose none of the f elements deleted from the high end of U are x-paired with
elements of W by c. Since d_x(U,W) ≤ f, the remaining n - f elements of U are x-paired with
elements of W by c, and thus every element of reduce(U) is x-paired with an element of
W. Suppose max(reduce(U)) is x-paired with w in W by c. Then max(reduce(U)) ≤ w + x ≤
max(W) + x.

Now suppose one of the elements deleted from the high end of U is x-paired with an
element of W by c. Let u be the largest such, and suppose it was paired with w in W. Then
max(reduce(U)) ≤ u ≤ w + x ≤ max(W) + x. ∎
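
A randomized check can make the bound concrete. The sketch below is an illustration only, with assumed names and value ranges: it builds U from W by moving every element by at most x and then overwriting f elements with arbitrary values, so at most f elements of U fail to be x-paired, and then tests both inequalities on the reduced multiset.

```python
import random


def reduce_multiset(U, f):
    """Remove the f largest and f smallest elements of U."""
    V = sorted(U)
    return V[f:len(V) - f]


def check_reduce_bounds(n, f, x, trials=1000, seed=0):
    """Return True if the bounds on reduce(U) hold in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        W = [rng.uniform(-10, 10) for _ in range(n)]
        # Each element of U starts within x of the matching element of W.
        U = [w + rng.uniform(-x, x) for w in W]
        # Up to f elements are replaced by arbitrary ("faulty") values.
        for k in rng.sample(range(n), f):
            U[k] = rng.uniform(-1000, 1000)
        R = reduce_multiset(U, f)
        if max(R) > max(W) + x or min(R) < min(W) - x:
            return False
    return True


ok = check_reduce_bounds(n=7, f=2, x=0.5)
```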

The  next  lemma  shows  that  the  results  of  reducing  two  multisets,  each  of  whose  x-distance  from  a 
third  multiset  is  0,  can't  contain  values  that  are  too  far  apart. 

Lemma 22: Let U, V, and W be multisets such that |U| = |V| = n and |W| = n - f, where
n > 3f. If d_x(W,U) = 0 and d_x(W,V) = 0, then min(reduce(U)) - max(reduce(V)) ≤ 2x.

Proof: First we show that there is a w in W such that w is x-paired both with some u in
reduce(U) and with some v in reduce(V) by the mappings witnessing d_x(W,U) and d_x(W,V)
respectively. We know |reduce(U)| = |reduce(V)| = n - 2f and |W| = n - f. In order to
choose two disjoint subsets of size n - 2f from a set of size n - f, it must be the case that
n - f ≥ 2(n - 2f). But this implies that n ≤ 3f, contradicting the hypothesis.

By choice of u, v, and w, we know that |u - w| ≤ x and |v - w| ≤ x. Thus, min(reduce(U))
- max(reduce(V)) ≤ u - v ≤ w + x - (w - x) = 2x. ∎

Lemma 23 is the main multiset result. It bounds the difference between the midpoints of two
reduced multisets in terms of a particular third multiset.

Lemma 23: Let U, V, and W be multisets such that |U| = |V| = n and |W| = n - f, where
n > 3f. If d_x(W,U) = 0 and d_x(W,V) = 0, then |mid(reduce(U)) - mid(reduce(V))| ≤
½diam(W) + 2x.

Proof: |mid(reduce(U)) - mid(reduce(V))|

= ½|max(reduce(U)) + min(reduce(U)) - max(reduce(V)) - min(reduce(V))|

= ½|max(reduce(U)) - min(reduce(V)) + min(reduce(U)) - max(reduce(V))|.

If the quantity inside the absolute value signs is nonnegative, this is

= ½[max(reduce(U)) - min(reduce(V)) + min(reduce(U)) - max(reduce(V))]

≤ ½[max(W) + x - (min(W) - x) + min(reduce(U)) - max(reduce(V))], by applying
Lemma 21 twice

= ½[diam(W) + 2x + min(reduce(U)) - max(reduce(V))]

≤ ½[diam(W) + 2x + 2x], by Lemma 22

= ½diam(W) + 2x.

If the quantity inside the absolute value is nonpositive, then symmetric reasoning gives
the result. ∎
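
The midpoint bound can also be exercised numerically. The following sketch is illustrative rather than part of the proof; the names, value ranges, and the small floating-point tolerance are assumptions. W plays the role of the n - f nonfaulty values, and U and V are two views of them: each contains a copy of every element of W perturbed by at most x, plus f arbitrary extra values.

```python
import random


def reduce_multiset(U, f):
    """Remove the f largest and f smallest elements of U."""
    V = sorted(U)
    return V[f:len(V) - f]


def mid(U):
    """Midpoint of U: half of max(U) + min(U)."""
    return (max(U) + min(U)) / 2


def diam(U):
    """Diameter of U: max(U) - min(U)."""
    return max(U) - min(U)


def check_midpoint_bound(n, f, x, trials=1000, seed=0):
    """Return True if the midpoints of the reduced views stay within the bound."""
    rng = random.Random(seed)

    def view_of(W):
        # Every element of W reappears within x, plus f arbitrary values.
        near = [w + rng.uniform(-x, x) for w in W]
        extra = [rng.uniform(-1000, 1000) for _ in range(f)]
        return near + extra

    for _ in range(trials):
        W = [rng.uniform(-10, 10) for _ in range(n - f)]
        U, V = view_of(W), view_of(W)
        gap = abs(mid(reduce_multiset(U, f)) - mid(reduce_multiset(V, f)))
        # 1e-9 absorbs floating-point rounding in the comparison.
        if gap > diam(W) / 2 + 2 * x + 1e-9:
            return False
    return True


ok = check_midpoint_bound(n=7, f=2, x=0.5)
```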